Factor analysis for audio-based video genre classification
Authors
Abstract
Statistical classifiers operate on features that generally contain both useful and useless information, and the two are difficult to separate in the feature domain. Recently, a new paradigm based on Latent Factor Analysis (LFA) proposed decomposing the model into useful and useless components. This method has been successfully applied to speaker and language recognition tasks. In this paper, we study the use of LFA for video genre classification using only the audio channel. We propose a classification method based on short-term cepstral features and Gaussian Mixture Model (GMM) or Support Vector Machine (SVM) classifiers combined with Factor Analysis (FA). Experiments are conducted on a corpus composed of five types of video (music, commercials, cartoons, movies and news). Relative to the baseline Gaussian Mixture Model-Universal Background Model (GMM-UBM) system, the best factor analysis configuration reduces the classification error by about 56%, corresponding to a correct identification rate of about 90%.
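The two figures quoted in the abstract can be related by simple arithmetic. The sketch below is not from the paper: it assumes the 56% is a relative reduction of the baseline error rate and that the 90% correct rate belongs to the best configuration, then backs out the baseline error this would imply. The function name is hypothetical, and because both figures are stated as "about", the result is only approximate.

```python
def implied_baseline_error(final_error: float, relative_reduction: float) -> float:
    """Baseline error rate implied by a final error rate and a relative reduction.

    Assumes final_error = baseline_error * (1 - relative_reduction).
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return final_error / (1.0 - relative_reduction)


# Figures taken from the abstract (both stated as approximate).
final_error = 1.0 - 0.90        # about 90% correct identification
relative_reduction = 0.56       # about 56% relative error reduction

baseline_error = implied_baseline_error(final_error, relative_reduction)
print(f"implied baseline error: {baseline_error:.1%}")
```

Under these assumptions the GMM-UBM baseline would have an error rate of roughly 23%, that is, a correct identification rate of roughly 77%. This is an inference from the abstract's rounded figures, not a number reported in it.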
Similar resources
Robust audio-based classification of video genre
Video genre classification is a challenging task in a global context of fast growing video collections available on the Internet. This paper presents a new method for video genre identification by audio analysis. Our approach relies on the combination of low and high level audio features. We investigate the discriminative capacity of features related to acoustic instability, speaker interactivi...
Modeling nuisance variabilities with factor analysis for GMM-based audio pattern classification
Audio pattern classification represents a particular statistical classification task and includes, for example, speaker recognition, language recognition, emotion recognition, speech recognition and, recently, video genre classification. The feature being used in all these tasks is generally based on a short-term cepstral representation. The cepstral vectors contain at the same time useful info...
Audio-Visual content description for video genre classification in the context of social media
In this paper we address automatic video genre classification with descriptors extracted from both audio (block-based features) and visual (color and temporal based) modalities. Tests performed on 26 genres from the blip.tv media platform demonstrate the potential of these descriptors for this task.
Content-Based Video Description for Automatic Video Genre Categorization
In this paper, we propose an audio-visual approach to video genre categorization. It exploits audio, color, temporal and contour information, which are in general genre specific. Audio information is extracted at block-level, which has the advantage of capturing local temporal information. At temporal level, we assess action contents with respect to human perception. Further, color perception is...
Video genre categorization and representation using audio-visual information
We propose an audio-visual approach to video genre classification using content descriptors that exploit audio, color, temporal, and contour information. Audio information is extracted at block level, which has the advantage of capturing local temporal information. At the temporal structure level, we consider action content in relation to human perception. Color perception is quantified using st...